Abstract. Consider a time series with missing observations but a known
final point. Using control theory ideas we estimate/predict these missing
observations. We obtain recurrence equations which minimize the sum of
squares of a control sequence. An advantage of this method is in easily
computable formulae and flexibility of its application to different
structures of missing data.

1. Introduction

Analysis and forecasting of missing data is a well-known area of
statistics going back to earlier works of Bartlett [4], Tocher [28],
Wilks [30], Yates [31] and many others (see the review paper [?]). There
is a large number of review papers and books related to this subject,
[1], [2], [12], [19] and [20], to mention a few. There are various
approaches to missing data, including Bayes methods [?], maximum
likelihood, multiple imputation methods, methods of non-parametric
regression and others, e.g. [2], [9], [29].

In the present paper we suggest a new method of predicting a special
class of missing observations in different time series including regression
and auto-regression models. We suggest a simple recurrence procedure
which, to the authors' knowledge, is new and simpler than the
computational procedures that were known before.

We study autoregressive time series with missing observations, which
we propose to predict using a control method. This method is developed
for different types of autoregressive models, including AR(p) models in
the case of scalar variables and AR(1) models in the case of
vector-valued observations. Forecasting missing data in autoregressive
time series has received special attention in the literature: [12], [?],
[15], [18], [20], [22], [21] and [27]. The typical approach to forecasting
missing data in autoregressive models considered in most papers is based
on maximization of the likelihood ratio, which can be computationally
intensive. The approach of the present paper, referred to as a control
method, yields easily computable formulae for the missing data.

It is known that a one-dimensional AR(p) model can be transformed
to the special case of a p-dimensional AR(1) model (e.g. Anderson [3]).
Nevertheless, in the present paper we consider both one-dimensional AR(p)
models and multidimensional AR(1) models. The representations obtained
in the case of one-dimensional AR(p) models are simpler for computations
than those for a multidimensional AR(1) model. Whereas the
representations for a one-dimensional AR(p) model are recurrence formulae
that can be calculated directly, the computations for a multidimensional
AR(1) model require two steps. In the first step we calculate the vector
norm, and then the vector corresponding to a missing value.
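The transformation mentioned above can be illustrated with a minimal Python sketch. It stacks the last p values of a scalar AR(p) recursion into a state vector and advances that vector with a companion matrix. The function names `ar_p_step`, `companion_matrix` and `companion_step` are illustrative choices and do not come from the paper; the intercept b is added to the first component only.

```python
def ar_p_step(history, coeffs, b):
    """One step of x_n = a_1 x_{n-1} + ... + a_p x_{n-p} + b.

    history holds the most recent values, newest last.
    """
    p = len(coeffs)
    recent = history[-p:][::-1]  # x_{n-1}, x_{n-2}, ..., x_{n-p}
    return sum(a * x for a, x in zip(coeffs, recent)) + b


def companion_matrix(coeffs):
    """p x p companion matrix built from the AR(p) coefficients a_1..a_p."""
    p = len(coeffs)
    rows = [list(coeffs)]
    for i in range(p - 1):
        # Shift rows: copy component i of the old state to component i + 1.
        rows.append([1.0 if j == i else 0.0 for j in range(p)])
    return rows


def companion_step(state, coeffs, b):
    """One step of the vector AR(1) form, state = [x_{n-1}, ..., x_{n-p}]."""
    A = companion_matrix(coeffs)
    p = len(state)
    new_state = [sum(A[i][j] * state[j] for j in range(p)) for i in range(p)]
    new_state[0] += b  # intercept enters the first component only
    return new_state
```

The first component of the new state equals the scalar AR(p) update, and the remaining components simply shift the older values down by one position.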

We assume that in the time series

(1.1)  $x_1, x_2, \ldots, x_{n_0}, x_{n_0+1}, x_{n_0+2}, \ldots, x_{N-1}, x_N = \bar{x}$

the first $n_0$ observations are known (observed), and the last value
$x_N = \bar{x}$ is assumed to be given too. The values $x_{n_0+1},
x_{n_0+2}, \ldots, x_{N-1}$ are missing.

This setup may have various applications, for example in economics
and finance, where historical data indices are given, while the last value
can be obtained from financial derivatives, or might be set externally.
In finance, for example, on the basis of historical volatilities and a
future value obtained from options one predicts the dynamics of the
volatility.

Although the paper is concerned with the data structure (1.1), the results
can be extended to different, more complicated structures of missing
values. Indeed, consider for instance the following data

(1.2)  $x_1, \ldots, x_{n_0}, x_{n_0+1}, \ldots, x_{n_1}, x_{n_1+1}, \ldots, x_{n_2}, x_{n_2+1}, \ldots, x_{N-1}, x_N = \bar{x}.$

Here in (1.2) the data indexed from 1 to $n_0$ and from $n_1 + 1$ to $n_2$
are known, the last point $x_N = \bar{x}$ is assigned, and the rest of the
data are missing. Then we have two groups of missing data, and standard
decomposition arguments can be used to reduce the analysis to that of a
single group with missing data.
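One possible reading of this decomposition, sketched in Python under stated assumptions: missing values are marked by `None`, and every maximal run of missing values is treated as its own single-group problem, bounded on the left by the last observed value and on the right by the next known value, which plays the role of the assigned final point. The helper name `find_gaps` is hypothetical and not taken from the paper.

```python
def find_gaps(series):
    """Return (start, end) index pairs of maximal runs of None.

    series[start:end] is a run of missing values. It is preceded by an
    observed value at index start - 1 and followed by a known value at
    index end. Leading or trailing runs of None are skipped because they
    lack one of the two anchors.
    """
    gaps = []
    i, n = 0, len(series)
    while i < n:
        if series[i] is None:
            j = i
            while j < n and series[j] is None:
                j += 1
            if i > 0 and j < n:
                gaps.append((i, j))
            i = j
        else:
            i += 1
    return gaps
```

Each returned pair can then be handled by a single-group procedure, with `series[start - 1]` as the last observation and `series[end]` as the given final value.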

The paper is structured as follows. In Section 2 we discuss the known
methods of forecasting missing observations as well as the method of the
present paper with comparisons. In the following sections we discuss
forecasting missing data by a control method in order of increasing
complexity. Specifically, in Section 3 we study the problem for the
simplest AR(1) model of time series, and in Section 4 we extend the
results to AR(p) models, p > 1. The multi-dimensional observations of an
AR(1) model are studied in Section 5. Then, in Section 6 the problem
is solved for models of regression. The results of this section are easily
understandable and simple. In Section 7 two numerical examples are
considered, in finance and archaeology.

2. Review of methods for missing values 

There is a large number of papers on estimation and forecasting of
missing observations in autoregressive models.

Jones [12] provides a method for calculating the exact likelihood
function of stationary ARMA time series. The method is based on
Akaike's Markovian representation and the application of Kalman's
recursions [5]. An advantage of Kalman's recursions is that the matrices
and vectors being used in calculations have dimensions max{p, q+1},
where p is the order of the auto-regression and q is the order of the
moving average, rather than dimensions corresponding to the number of
observations. A non-linear optimization program is then used to find
the maximum likelihood estimates.

Kohn and Ansley [18] study interpolation of missing data for
non-stationary ARIMA models. The likelihood ratio for these models does
not exist in the usual sense, and the authors define a marginal likelihood
ratio. They show that the marginal likelihood approach reduces in some
cases to the usual likelihood approach. Their forecasting of missing
observations is based on a modified Kalman filter, which was introduced in
the earlier paper of these authors [17].

Shin and Pantula [26] discuss the testing problem for a unit root
in an autoregressive model where data are available for each m-th period.
The idea is to use characteristic polynomials and properties of
their coefficients. Under a special assumption, the authors of [26]
estimate the parameters of ARMA(p, p-1) by fitting an ARMA(1, p-1) model.
By using a Monte Carlo simulation, the results were compared with those
obtained in the earlier papers of Pantula and Hall [21], Said and Dickey
[23] and Shin and Fuller [25], who also studied the same testing problem.

Forecasting in autoregressive models has also been studied by Kharin
and Huryn [15] and [16]. In [15] they investigate the case of unknown
parameters of an autoregressive model based on the so-called "plug-in"
approach. The "plug-in" approach consists of two steps: (i) estimation
of the model parameters by some known approach and (ii) forecasting,
based on the estimates of the parameters obtained in the first step. This
method has lower computational complexity than other methods, such
as straightforward joint maximum likelihood estimation of the parameters
and future values of the time series, or the Expectation-Maximization
algorithm (e.g. Little and Rubin [20], Jordan and Jacobs [?]). In [16]
the mean-squared error of maximum likelihood forecasting in the case
of missing values is obtained for many autoregressive time series.
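The two steps of a "plug-in" scheme can be sketched for the simplest scalar AR(1) case. This is only an illustration of the general idea under the assumption of a complete observed stretch of data, not the procedure of [15]; the names `fit_ar1` and `forecast_ar1` are hypothetical.

```python
def fit_ar1(xs):
    """Step (i): least squares fit of x_n = a * x_{n-1} + b.

    Assumes the lagged values are not all equal, so the slope is defined.
    """
    prev, curr = xs[:-1], xs[1:]
    m = len(prev)
    mean_p = sum(prev) / m
    mean_c = sum(curr) / m
    sxx = sum((p - mean_p) ** 2 for p in prev)
    sxy = sum((p - mean_p) * (c - mean_c) for p, c in zip(prev, curr))
    a = sxy / sxx
    b = mean_c - a * mean_p
    return a, b


def forecast_ar1(last, a, b, steps):
    """Step (ii): iterate the fitted recursion from the last observation."""
    out = []
    x = last
    for _ in range(steps):
        x = a * x + b
        out.append(x)
    return out
```

For instance, `a, b = fit_ar1(data)` followed by `forecast_ar1(data[-1], a, b, 3)` gives a three-step extrapolation with the estimated parameters plugged in.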

The above-mentioned papers [15] and [16] both study a general scheme
of missing data. Together with vector-valued time series they introduce
a binary vector characterizing a "missing pattern", but the solution for
this general formulation is hard to implement in practice.

The aim of the present paper is prediction (interpolation) of missing
observations, whereas the aim of the two above-mentioned papers is
forecasting in the presence of missing observations, i.e. the forecasting
procedure takes into account missing observations. Furthermore,
the approach of the present paper deals with specific data structures
(Section 1), and can be extended to more complicated structures of
missing data. In the initial step we use least squares predictors for
the preliminary extrapolation of missing values. Then, taking into
account the last known observation, we make corrections by formulating
and solving a control problem. The control problem is formulated in
terms of minimization of sums of squares of errors, which in itself is
a classical approach. However, our method is based on a novel application
of the Cauchy-Schwarz inequality in a simple case, which is then
extended to other more complicated cases. The use of the Cauchy-Schwarz
inequality is a known technique in optimization, e.g. [8] and [11];
however, in the context of prediction of missing data this method
seems to be new. In addition, this method yields easily computable
recurrence formulae for missing values.

3. A control method for missing data

In this section we consider autoregressive time series of the following 
type: 

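(3.1)  $x_1, x_2, \ldots, x_{n_0}, \tilde{x}_{n_0+1}, \tilde{x}_{n_0+2}, \ldots, \tilde{x}_{N-1}, x_N = \bar{x}.$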

The values $x_1, x_2, \ldots, x_{n_0}$ are assumed to be observed, while by
$\tilde{x}_{n_0+1}, \tilde{x}_{n_0+2}, \ldots, \tilde{x}_{N-1}$ we denote
estimates of the missing observations. The value $x_N = \bar{x}$ is also
known. It is convenient to denote this value with a tilde too, i.e.
$\tilde{x}_N = x_N = \bar{x}$.

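Theorem 3.1. Best predictors for the missing values are given by

(3.2)  $\tilde{x}_n = a\tilde{x}_{n-1} + b + \frac{\bar{x} - \hat{x}_N}{\sum_{i=n_0+1}^{N} a^{2(N-i)}}\, a^{N-n}, \quad n = n_0+1, \ldots, N-1,$

where the coefficients $a$ and $b$ are the least squares solutions of the
autoregressive equations

$x_n = a x_{n-1} + b$

for the first $n_0$ observations, $n = 1, 2, \ldots, n_0$. Here
$\hat{x}_N$ denotes the value at $n = N$ of the least squares predictor
(3.3).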



PREDICTION OF MISSING OBSERVATIONS 7 

Proof. Taking into consideration the first $n_0$ observed values one can
build the linear least squares predictor as

(3.3)  $\hat{x}_n = a\hat{x}_{n-1} + b$

for $n = n_0+1, n_0+2, \ldots, N$, where the parameters $a$ and $b$ are the
regression coefficients. These $a$ and $b$ are then used for the control
problem, which is to find the unknown points minimizing the sum of squares
of controls leading to the known final value. Namely, for
$n = n_0+1, \ldots, N$ ($\tilde{x}_{n_0} = x_{n_0}$)

(3.4)  $\tilde{x}_n = a\tilde{x}_{n-1} + b + u_n.$

It can be seen as a correction of the initial linear equation for
$\hat{x}_n$ with a control sequence $u_n$. The control problem is to
minimize the sum of squares of controls under the condition that the
auto-regression ends up at the specified point:

(3.5)  $\min_{u_n:\ \tilde{x}_N = \bar{x}} \ \sum_{n=n_0+1}^{N} u_n^2.$

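This minimization problem is solved as follows. By (3.3) and (3.4), the
deviation $U_n$ of the controlled value from the least squares predictor
satisfies

(3.6)  $U_n^2 = (\tilde{x}_n - \hat{x}_n)^2,$

and taking into account that

$\tilde{x}_N = x_{n_0} a^{N-n_0} + \sum_{n=n_0+1}^{N} a^{N-n}(b + u_n)$

and

$\hat{x}_N = x_{n_0} a^{N-n_0} + b \sum_{n=n_0+1}^{N} a^{N-n},$

from (3.6) we obtain

(3.7)  $U_N^2 = \Big(\sum_{n=n_0+1}^{N} a^{N-n} u_n\Big)^2.$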



VYACHESLAV M. ABRAMOV AND FIMA C. KLEBANER 



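By the Cauchy-Schwarz inequality,

(3.8)  $\Big(\sum_{n=n_0+1}^{N} a^{N-n} u_n\Big)^2 \le \sum_{n=n_0+1}^{N} a^{2(N-n)} \sum_{n=n_0+1}^{N} u_n^2.$

The equality in (3.8) is achieved if and only if $u_n$ is proportional to
$a^{N-n}$, that is, $u_n = c\,a^{N-n}$ for some constant $c$. Since the
left-hand side of (3.8) is fixed by the terminal condition, equality in
(3.8) corresponds to the minimum of $\sum_{n=n_0+1}^{N} u_n^2$, and the
problem reduces to finding the appropriate value $c = c^*$ such that

$u_n = c^* a^{N-n}.$

Therefore,

$\tilde{x}_N = \hat{x}_N + c^* \sum_{i=n_0+1}^{N} a^{2(N-i)},$

and then finally for $c^*$ we have:

(3.9)  $c^* = \frac{\tilde{x}_N - \hat{x}_N}{\sum_{i=n_0+1}^{N} a^{2(N-i)}}.$

Thus, the sequence $u_n$ satisfying (3.5) is

$u_n = \frac{\bar{x} - \hat{x}_N}{\sum_{i=n_0+1}^{N} a^{2(N-i)}}\, a^{N-n},$

and its substitution into (3.4) yields the desired result (3.2). □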
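The recursion of Theorem 3.1 can be sketched in a few lines of Python. The sketch assumes that the coefficients a and b have already been estimated from the observed part of the series; the function name `control_fill_ar1` and its argument names are illustrative and not taken from the paper.

```python
def control_fill_ar1(x_last, x_final, a, b, n_missing):
    """Fill n_missing values between x_last = x_{n0} and x_final = x_N.

    Here N - n0 = n_missing + 1. Returns the corrected values
    for n = n0+1, ..., N; the last entry should reproduce x_final.
    """
    steps = n_missing + 1  # N - n0

    # Least squares extrapolation (3.3) carried forward to index N.
    x_hat = x_last
    for _ in range(steps):
        x_hat = a * x_hat + b

    # Denominator of (3.9): sum of a^{2(N-i)} for i = n0+1, ..., N.
    denom = sum(a ** (2 * k) for k in range(steps))
    c_star = (x_final - x_hat) / denom

    # Recursion (3.4) with the optimal control u_n = c* a^{N-n}.
    filled = []
    x = x_last
    for step in range(1, steps + 1):  # n = n0 + step
        u = c_star * a ** (steps - step)
        x = a * x + b + u
        filled.append(x)
    return filled
```

With a = 1 and b = 0 the controls are all equal, so the sketch reduces to linear interpolation between the last observation and the final point, which is a convenient sanity check.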

4. Extension of the result for AR(p) model 

Under the assumption that (3.1) is given, we first find the best linear
predictor for the AR(2) model as

(4.1)  $\hat{x}_n = a_1\hat{x}_{n-1} + a_2\hat{x}_{n-2} + b$

($n = n_0+2, n_0+3, \ldots, N$), and then extend the result to the general
case of an AR(p) model.

Theorem 4.1. For the AR(2) model, the best predictor is given by

(4.2)  $\tilde{x}_n = a_1\tilde{x}_{n-1} + a_2\tilde{x}_{n-2} + b + \frac{\bar{x} - \hat{x}_N}{\sum_{i=n_0+2}^{N} \gamma_{N-i}^2}\, \gamma_{N-n},$

where $n = n_0+2, n_0+3, \ldots, N-1$, and the coefficients $\gamma_n$ are
as follows: $\gamma_n = \alpha_n + \beta_n$,

$\alpha_0 = 1,$
$\beta_0 = 0,$
$\alpha_n = a_1(\alpha_{n-1} + \beta_{n-1}) \quad (n \ge 1),$
$\beta_1 = 1,$
$\beta_n = a_2(\alpha_{n-2} + \beta_{n-2}) + 1 \quad (n \ge 2).$

The coefficients $a_1$, $a_2$ and $b$ for equation (4.2) are the least
squares solutions of the autoregressive equation

$x_n = a_1 x_{n-1} + a_2 x_{n-2} + b,$

which are obtained from the first $n_0$ observations.
Proof. In the case of the AR(2) model we have

(4.3)  $\hat{x}_{n_0+2} = a_1\hat{x}_{n_0+1} + a_2 x_{n_0} + b,$

and similarly to (3.4),

(4.4)  $\tilde{x}_n = a_1\tilde{x}_{n-1} + a_2\tilde{x}_{n-2} + b + u_n$

($n = n_0+2, n_0+3, \ldots, N$).

Let us now consider the difference $\tilde{x}_N - \hat{x}_N = U_N$. For
this difference we have the following expansion

(4.5)  $U_N = \sum_{n=n_0+2}^{N} \gamma_{N-n} u_n,$

with some coefficients $\gamma_{N-n}$. Now, the main task is to determine
these coefficients. Write $\gamma_n = \alpha_n + \beta_n$. Then, using
induction we obtain

$\alpha_0 = 1,$
$\beta_0 = 0,$
(4.6)  $\alpha_n = a_1(\alpha_{n-1} + \beta_{n-1}) \quad (n \ge 1),$
$\beta_1 = 1,$
$\beta_n = a_2(\alpha_{n-2} + \beta_{n-2}) + 1 \quad (n \ge 2).$

Specifically, for the first steps we have the following. Setting
$N = n_0 + 2$ leads to the obvious identity
$U_N = \gamma_0 u_N = (\alpha_0 + \beta_0) u_N$. In the case
$N = n_0 + 3$ we have

$U_N = u_N + (a_1 + 1) u_{N-1}$
$\quad = \gamma_0 u_N + [a_1(\alpha_0 + \beta_0) + 1] u_{N-1}$
$\quad = \gamma_0 u_N + \gamma_1 u_{N-1}.$

In the case $N = n_0 + 4$ we have

$U_N = u_N + (a_1\gamma_0 + 1) u_{N-1} + [a_1(a_1 + 1) + (a_2 + 1)] u_{N-2}$
$\quad = \gamma_0 u_N + \gamma_1 u_{N-1} + [a_1(\alpha_1 + \beta_1) + a_2\gamma_0 + 1] u_{N-2}$
$\quad = \gamma_0 u_N + \gamma_1 u_{N-1} + \gamma_2 u_{N-2}.$

The next steps follow by induction, and we have the recurrence relation
(4.6) above.
Therefore,

(4.7)  $U_N^2 = \Big(\sum_{n=n_0+2}^{N} \gamma_{N-n} u_n\Big)^2,$

and similarly to (3.8), by the Cauchy-Schwarz inequality,

(4.8)  $\Big(\sum_{n=n_0+2}^{N} \gamma_{N-n} u_n\Big)^2 \le \sum_{n=n_0+2}^{N} \gamma_{N-n}^2 \sum_{n=n_0+2}^{N} u_n^2.$

The equality in (4.8) is achieved if and only if $u_n$ is proportional to
$\gamma_{N-n}$, that is, $u_n = c\,\gamma_{N-n}$ for some constant $c$.
Since the left-hand side of (4.8) is fixed by the terminal condition,
equality in (4.8) corresponds to the minimum of
$\sum_{n=n_0+2}^{N} u_n^2$, and the problem reduces to finding the
appropriate value $c = c^*$ such that

$u_n = c^* \gamma_{N-n}.$

This finishes the proof. □
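For computations, the coefficients can be generated directly from the recurrence. The following Python sketch transcribes recurrence (4.6) exactly as it is stated in Theorem 4.1, reading the index conditions as n ≥ 1 and n ≥ 2; it does not independently verify the recurrence. The function name `gamma_coefficients` is an illustrative choice.

```python
def gamma_coefficients(a1, a2, count):
    """Return [gamma_0, ..., gamma_{count-1}] with gamma_n = alpha_n + beta_n.

    alpha_0 = 1, beta_0 = 0, beta_1 = 1,
    alpha_n = a1 * (alpha_{n-1} + beta_{n-1})      for n >= 1,
    beta_n  = a2 * (alpha_{n-2} + beta_{n-2}) + 1  for n >= 2.
    """
    if count <= 0:
        return []
    alpha = [1.0]
    beta = [0.0]
    for n in range(1, count):
        alpha.append(a1 * (alpha[n - 1] + beta[n - 1]))
        if n == 1:
            beta.append(1.0)
        else:
            beta.append(a2 * (alpha[n - 2] + beta[n - 2]) + 1.0)
    return [al + be for al, be in zip(alpha, beta)]
```

The first three values reproduce the coefficients appearing in the expansions for $N = n_0+2$, $n_0+3$ and $n_0+4$ in the proof: $\gamma_0 = 1$, $\gamma_1 = a_1 + 1$ and $\gamma_2 = a_1(a_1+1) + a_2 + 1$.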

The results above are easily extended to general AR(p) models.
Specifically, we have

(4.9)  $\hat{x}_n = a_1\hat{x}_{n-1} + a_2\hat{x}_{n-2} + \ldots + a_p\hat{x}_{n-p} + b,$

and

(4.10)  $\tilde{x}_n = a_1\tilde{x}_{n-1} + a_2\tilde{x}_{n-2} + \ldots + a_p\tilde{x}_{n-p} + b + u_n$

($n = n_0+p, n_0+p+1, \ldots, N$), and

(4.11)  $U_N = \tilde{x}_N - \hat{x}_N = \sum_{n=n_0+p}^{N} \gamma_{N-n} u_n,$

where $\gamma_n = \alpha_{n,1} + \alpha_{n,2} + \ldots + \alpha_{n,p}$, and

$\alpha_{0,1} = 1,$
$\alpha_{0,k} = 0, \quad k = 2, 3, \ldots, p,$
$\alpha_{n,1} = a_1(\alpha_{n-1,1} + \alpha_{n-1,2} + \ldots + \alpha_{n-1,p}) \quad (n \ge 1),$
$\alpha_{1,2} = 1,$
$\alpha_{1,k} = 0, \quad k = 3, 4, \ldots, p,$
$\alpha_{n,2} = a_2(\alpha_{n-2,1} + \alpha_{n-2,2} + \ldots + \alpha_{n-2,p}) \quad (n \ge 2),$
(4.12)  $\ldots = \ldots$
$\alpha_{k-1,k} = 1, \quad k = 1, 2, \ldots, p-1,$
$\alpha_{k-1,l} = 0, \quad l = k+1, k+2, \ldots, p,$
$\alpha_{n,k} = a_k(\alpha_{n-k,1} + \alpha_{n-k,2} + \ldots + \alpha_{n-k,p}) \quad (n \ge k),$
$\alpha_{n,p} = a_p(\alpha_{n-p,1} + \alpha_{n-p,2} + \ldots + \alpha_{n-p,p}) + 1 \quad (n \ge p).$

Thus, similarly to (4.2) we have the following formula

(4.13)  $\tilde{x}_n = a_1\tilde{x}_{n-1} + a_2\tilde{x}_{n-2} + \ldots + a_p\tilde{x}_{n-p} + b + \frac{\bar{x} - \hat{x}_N}{\sum_{i=n_0+p}^{N} \gamma_{N-i}^2}\, \gamma_{N-n}$

($n = n_0+p, n_0+p+1, \ldots, N-1$), where $\gamma_n$ are now defined
according to (4.12).

5. Multi-dimensional autoregressive model 

In this section we study a multidimensional version of the problem
for AR(1). Let

(5.1)  $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_{n_0}, \mathbf{x}_{n_0+1}, \mathbf{x}_{n_0+2}, \ldots, \mathbf{x}_{N-1}, \mathbf{x}_N = \bar{\mathbf{x}}.$

For this last value we shall also write
$\tilde{\mathbf{x}}_N = \mathbf{x}_N$ (with tilde).

As above, the values $\mathbf{x}_1, \mathbf{x}_2, \ldots,
\mathbf{x}_{n_0}$ are assumed to be observed values, while
$\mathbf{x}_{n_0+1}, \mathbf{x}_{n_0+2}, \ldots, \mathbf{x}_{N-1}$ are
missing observations.

Taking into consideration only the first $n_0$ observed values one can
build the linear least squares predictor as

(5.2)  $\hat{\mathbf{x}}_n = A\hat{\mathbf{x}}_{n-1} + \mathbf{b}$

for $n = n_0+1, n_0+2, \ldots, N$. Here $A$ is a square matrix, and
$\mathbf{b}$ is a vector.

For $n = n_0+1, \ldots, N$ ($\tilde{\mathbf{x}}_{n_0} = \mathbf{x}_{n_0}$)
we find the unknown points by

(5.3)  $\tilde{\mathbf{x}}_n = A\tilde{\mathbf{x}}_{n-1} + \mathbf{b} + \mathbf{u}_n.$

The problem is to find the vectors $\mathbf{u}_n$, $n = n_0+1, \ldots, N$,
such that they minimize the sum of squares of their lengths subject to the
constraint that the auto-regression attains the specified point
$\bar{\mathbf{x}}$:

(5.4)  $\min_{\mathbf{u}_n:\ \tilde{\mathbf{x}}_N = \bar{\mathbf{x}}} \ \sum_{n=n_0+1}^{N} \|\mathbf{u}_n\|^2,$

where

$\|\mathbf{u}_n\|^2 = \sum_{j=1}^{k} u_{n,j}^2,$

and $u_{n,j}$ denotes the jth component of the (k-dimensional) vector
$\mathbf{u}_n$. According to (5.2) and (5.3),

(5.5)  $\|\mathbf{U}_n\|^2 = \|\tilde{\mathbf{x}}_n - \hat{\mathbf{x}}_n\|^2,$

and for the endpoint $\tilde{\mathbf{x}}_N$ we have

(5.6)  $\|\mathbf{U}_N\|^2 = \Big\|\sum_{n=n_0+1}^{N} A^{N-n}\mathbf{u}_n\Big\|^2.$

Let $a_{i,j}^{(n)}$ denote the element $(i, j)$ of the matrix $A^n$. The
ith element of the product of $A^{N-n}$ and the vector $\mathbf{u}_n$ can
be written as

$\sum_{j=1}^{k} a_{i,j}^{(N-n)} u_{n,j},$

where $u_{n,j}$ is the jth element of the vector $\mathbf{u}_n$. Therefore
(5.6) can be written as

(5.7)  $\|\mathbf{U}_N\|^2 = \sum_{i=1}^{k} \Big(\sum_{n=n_0+1}^{N} \sum_{j=1}^{k} a_{i,j}^{(N-n)} u_{n,j}\Big)^2$

[the remainder of display (5.7), which splits the right-hand side into a
"first term" and a "second term", is not legible in the source].

Therefore, by the Cauchy-Schwarz inequality, we have

(5.8)  [display not legible in the source].

The equality in (5.8) is achieved if and only if for some constant $c$,

(5.9)  $\sum_{i=1}^{k} a_{i,j}^{(N-n)} = c\,u_{n,j},$

and similarly to that of Section 3 the optimal value of this constant
$c^*$ is

(5.10)  [display not legible in the source; the surviving fragments
contain the sum $\sum_{n=n_0+1}^{N} \sum_{j=1}^{k} \big(\sum_{i=1}^{k} a_{i,j}^{(N-n)}\big)^2$].

(5.11)  [display not legible in the source; the surviving fragments
contain $\bar{\mathbf{x}} - \hat{\mathbf{x}}_N$ and sums of
$a_{i,j}^{(N-l)}$ and $a_{i,j}^{(N-n)}$ over $i$ and $j$].

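Let us now find the vectors $\mathbf{u}_n$, $n = n_0+1, n_0+2, \ldots, N$.
From (5.2) and (5.3) we have the following:

(5.12)  $\mathbf{U}_N = \sum_{n=n_0+1}^{N} A^{N-n}\mathbf{u}_n.$

Therefore for the components of the vector $\mathbf{U}_N$ we have the
equations

(5.13)  $U_{N,i} = \sum_{n=n_0+1}^{N} A_i^{(N-n)}\mathbf{u}_n,$

where $A_i^{(N-n)}$ denotes the ith row of the matrix $A^{N-n}$.
Therefore, by the Cauchy-Schwarz inequality,

(5.14)  $U_{N,i}^2 = \Big(\sum_{n=n_0+1}^{N} A_i^{(N-n)}\mathbf{u}_n\Big)^2 = \Big(\sum_{n=n_0+1}^{N} \sum_{j=1}^{k} a_{i,j}^{(N-n)} u_{n,j}\Big)^2 \le \Big(\sum_{n=n_0+1}^{N} \sum_{j=1}^{k} \big(a_{i,j}^{(N-n)}\big)^2\Big) \Big(\sum_{n=n_0+1}^{N} \sum_{j=1}^{k} u_{n,j}^2\Big),$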

where equality is achieved if, for some $c_i$,

(5.15)  [display not legible in the source].

Therefore, substituting (5.15) into (5.13) we obtain:

(5.16)  [display not legible in the source].

6. Models of multi-regression

Regression models with incomplete data have been studied intensively
in the literature, and there are many approaches to the solution of this
problem. Nevertheless, the theoretical aspect of the present approach
seems to be new.

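1. Consider first the following data:

(6.1)  $y_1, y_2, \ldots, y_{n_0}, y_{n_0+1}, \ldots, y_{N-1}, y_N = \bar{y}$
       $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_{n_0}, \mathbf{x}_{n_0+1}, \ldots, \mathbf{x}_{N-1}, \mathbf{x}_N.$

As above we use the notation $\tilde{y}_N = y_N$.

We first find the vector $\mathbf{a}$ and the parameter $b$ by the linear
least squares predictor, so for $n = n_0+1, n_0+2, \ldots, N$ we have

(6.2)  $\hat{y}_n = \mathbf{a}^T\mathbf{x}_n + b.$

We have

$\hat{y}_n = \mathbf{a}^T\mathbf{x}_n + b$
$\quad = \mathbf{a}^T\mathbf{x}_{n-1} + b + \mathbf{a}^T(\mathbf{x}_n - \mathbf{x}_{n-1})$
$\quad = \hat{y}_{n-1} + b_n,$

where $b_n = \mathbf{a}^T(\mathbf{x}_n - \mathbf{x}_{n-1})$.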
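Therefore considering

(6.3)  $\tilde{y}_n = \tilde{y}_{n-1} + b_n + u_n,$

where $\tilde{y}_{n_0} = y_{n_0}$ and $\tilde{y}_N = y_N$, and the same
problem of minimizing

$\sum_{n=n_0+1}^{N} u_n^2,$

we arrive at

(6.4)  $\tilde{y}_n = \tilde{y}_{n-1} + b_n + \frac{\bar{y} - \hat{y}_N}{N - n_0}.$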
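A minimal Python sketch of recursion (6.4) follows. It rests on one assumption that is made here and not stated in the text: $\hat{y}_N$ is taken to be the uncontrolled recursion $\hat{y}_n = \hat{y}_{n-1} + b_n$ started from the observed $y_{n_0}$, so that the corrected path ends exactly at the given final value (the intercept b then cancels and is not needed). The function name `control_fill_regression` is illustrative.

```python
def control_fill_regression(y_last, y_final, a, xs):
    """Fill y for n = n0+1, ..., N given regressors xs = [x_{n0}, ..., x_N].

    a is the fitted coefficient vector; each element of xs is a list of
    the same length as a. The last returned value should equal y_final.
    """
    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    steps = len(xs) - 1  # N - n0
    # b_n = a^T (x_n - x_{n-1})
    increments = [dot(a, xs[i]) - dot(a, xs[i - 1]) for i in range(1, len(xs))]
    # Uncontrolled extrapolation started from the last observed y (assumption).
    y_hat_final = y_last + sum(increments)
    # Equal controls, since the recursion (6.3) has unit coefficient.
    correction = (y_final - y_hat_final) / steps

    filled = []
    y = y_last
    for b_n in increments:
        y = y + b_n + correction
        filled.append(y)
    return filled
```

Because the coefficient of $\tilde{y}_{n-1}$ in (6.3) equals one, every step receives the same correction, which is the $a = 1$ special case of the control sequence of Section 3.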
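2. Let us consider a more extended problem

(6.5)  $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_{n_0}, \mathbf{y}_{n_0+1}, \ldots, \mathbf{y}_{N-1}, \mathbf{y}_N = \bar{\mathbf{y}}$
       $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_{n_0}, \mathbf{x}_{n_0+1}, \ldots, \mathbf{x}_{N-1}, \mathbf{x}_N,$

where the vectors $\mathbf{y}$ of the first row are all of dimension $m$.
By the linear least squares predictor we have

(6.6)  $\hat{\mathbf{y}}_n = A\mathbf{x}_n + \mathbf{b}.$

Here the vectors $\hat{\mathbf{y}}_n$ are of dimension $m$, the matrix $A$
is of size $m \times k$ and the vector $\mathbf{b}$ is of dimension $m$.
We have:

$\hat{\mathbf{y}}_n = A\mathbf{x}_n + \mathbf{b}$
(6.7)  $\quad = A\mathbf{x}_{n-1} + \mathbf{b} + A(\mathbf{x}_n - \mathbf{x}_{n-1})$
$\quad = \hat{\mathbf{y}}_{n-1} + \mathbf{b}_n,$

where $\mathbf{b}_n = A(\mathbf{x}_n - \mathbf{x}_{n-1})$.

Let us now consider the equation

(6.8)  $\tilde{\mathbf{y}}_n = \tilde{\mathbf{y}}_{n-1} + \mathbf{b}_n + \mathbf{u}_n.$

In this specific case we have

(6.9)  $\mathbf{U}_N = \sum_{n=n_0+1}^{N} \mathbf{u}_n.$

By the same calculations as earlier (see (5.11)) we have:

(6.10)  $\|\mathbf{u}_n\| = \frac{\|\bar{\mathbf{y}} - \hat{\mathbf{y}}_N\|}{N - n_0},$

and all the constants $c_i$ defined in Section 5 are the same. Therefore

[display not legible in the source].

We finally have

$\tilde{\mathbf{y}}_n = \tilde{\mathbf{y}}_{n-1} + \mathbf{b}_n + \frac{\bar{\mathbf{y}} - \hat{\mathbf{y}}_N}{N - n_0}.$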
7. Numerical work

The numerical work of this paper consists of two different parts. The
first part is related to interpolating missing data in autoregressive
models. The two numerical results of this part are shown in Figure 1a
and Figure 1b. The second part of the numerical work is related to a
two-dimensional autoregressive model. The data for this model come from
archaeology and are taken from the paper of Cavanagh, Buck and
Litton [6].

7.1. Part 1. The real data of the volatility dynamic of the IBM company,
calculated on the basis of the stock information by the method of [?],
have been used for Figure 1a. We removed some data from the middle and
the end of this dynamic and then forecasted the missing data by an AR(1)
model for the structure of missing data described by (1.2). The
value $n_0$ is equal to 418, and the corresponding number of missing data
is 85. Then the value $n_1$ is equal to 1058 and the corresponding number
of missing data is 106.

In the second example (Figure 1b) we use the volatility dynamic of the
exchange rate of USD and New Israel Shekel. The historical period,
$n_0$, is 1319, and the total length, $N$, is 1466. Assuming that the
volatility dynamic is an AR(1) model, $v_{n+1} = av_n + b$, the
calculation of the parameters by the linear least squares predictor gives
$a \approx 0.999576$ and $b \approx 2.652 \cdot 10^{-7}$. Assuming that
the volatility dynamic satisfies an AR(2) model,
$v_{n+2} = a_1 v_{n+1} + a_2 v_n + b$, we correspondingly obtain
$a_1 \approx 2.4$, $a_2 \approx -1.4$ and $b \approx 2.8 \cdot 10^{-7}$.
As we can see, although the difference between these predicted models is
small, both curves are nevertheless visible in the graph.
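For the AR(1) fit in this example, the denominator of (3.9) is a finite geometric sum with $N - n_0 = 147$ terms, so it can be evaluated either term by term or in closed form. The following Python sketch compares the two; the function names are illustrative, and no numerical value from the paper is reproduced here.

```python
def control_denominator(a, n0, N):
    """Sum of a^{2(N-i)} for i = n0+1, ..., N, computed term by term."""
    return sum(a ** (2 * (N - i)) for i in range(n0 + 1, N + 1))


def control_denominator_closed_form(a, n0, N):
    """Same sum via the geometric series formula (1 - a^{2m}) / (1 - a^2)."""
    m = N - n0
    if a * a == 1.0:
        return float(m)  # every term equals 1
    return (1.0 - a ** (2 * m)) / (1.0 - a * a)
```

With the values of this example one would call `control_denominator(0.999576, 1319, 1466)`. Because $a$ is close to one, the result is somewhat below the number of terms, 147.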
Figure 1.

(a) Forecasting missing data of volatility for IBM Co.:
blue line - known values, purple line - predicted values
by AR(1) model;

(b) Forecasting missing data of volatility for USD-New
Israel Shekel exchange: blue line - known values, purple
line - predicted values by AR(1) model and yellow line -
predicted values by AR(2) model.

Table 1. Phosphate concentration (the fragment of
data from [6]).



7.2. Part 2. In this part we use data from [6]. These are data on
phosphate concentration shown in Figure 1 (p. 94) of that paper. There are
missing data in the fifth and sixth rows of these data, and these two rows
are the rows of Table 1 corresponding to the two-dimensional vector
$\mathbf{x}$ with missing data, where the missing data are indicated
by '+'.

We use a first order autoregressive model in order to predict the
missing data. We do not provide all intermediate calculations; only
meaningful results are shown here.

The filling of these missing data is carried out in two steps. According
to our notation $n_0 = 10$ and $n_1 = 12$. We have
$\hat{\mathbf{x}}_{11} = (61.10, 47.98)^T$. Next, taking into account the
value $\mathbf{x}_{13} = (166, 68)^T$, we obtain the following values for
$\tilde{\mathbf{x}}_{11}$ and $\tilde{\mathbf{x}}_{12}$:

$\tilde{\mathbf{x}}_{11} = (95.10, 55.33)^T, \quad \tilde{\mathbf{x}}_{12} = (130.43, 56.67)^T.$

(A further vector $(60.43, 28.57)^T$ appears in the source at this point;
its label is not legible.)

Next, according to the accepted notation, $N = 16$. Similarly, we first
find $\hat{\mathbf{x}}_{15} = (76.90, 62.93)^T$. Then,
$\tilde{\mathbf{x}}_{15} = (71.45, 61.15)^T$.

The finally modified table after calculation of missing data is now
Table 2.

Table 2. Phosphate concentration (the finally modified table).